library(genovisr)
#> Warning: replacing previous import 'ggplot2::last_plot' by 'plotly::last_plot'
#> when loading 'genovisr'
#> Warning: replacing previous import 'shiny::runExample' by 'shinyjs::runExample'
#> when loading 'genovisr'
library(GBScleanR)
#> Loading required package: SeqArray
#> Loading required package: gdsfmt
library(plotly)
#> Loading required package: ggplot2
#>
#> Attaching package: 'plotly'
#> The following object is masked from 'package:ggplot2':
#>
#> last_plot
#> The following object is masked from 'package:stats':
#>
#> filter
#> The following object is masked from 'package:graphics':
#>
#> layout
library(Biostrings)
#> Loading required package: BiocGenerics
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#> colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#> get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
#> match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#> Position, rank, rbind, Reduce, rownames, sapply, setdiff, table,
#> tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> Loading required package: stats4
#>
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:plotly':
#>
#> rename
#> The following object is masked from 'package:utils':
#>
#> findMatches
#> The following objects are masked from 'package:base':
#>
#> expand.grid, I, unname
#> Loading required package: IRanges
#>
#> Attaching package: 'IRanges'
#> The following object is masked from 'package:plotly':
#>
#> slice
#> Loading required package: XVector
#> Loading required package: GenomeInfoDb
#>
#> Attaching package: 'Biostrings'
#> The following object is masked from 'package:base':
#>
#> strsplit
Create dummy data to walk through the package with.
object <- simGenovis()
The marker_info slot of the genovis class object stores the position
and allele information of the genotyping markers. The id
column shows marker IDs that have been created by concatenating the
chromosome IDs and physical positions of markers.
# Marker information
head(object$marker_info)
#> id chr pos ref_allele alt_allele
#> 1 1_9518 1 9518 G A
#> 2 1_50407 1 50407 G A
#> 3 1_191311 1 191311 G A
#> 4 1_199996 1 199996 G A
#> 5 1_205100 1 205100 G A
#> 6 1_247607 1 247607 G A
The sample_info slot stores sample IDs. If your data was created by
merging two genovis class objects with the pop argument,
this slot also contains the pop column to indicate which
populations the samples belong to.
# Sample information
head(object$sample_info)
#> id
#> 1 sample_1
#> 2 sample_2
#> 3 sample_3
#> 4 sample_4
#> 5 sample_5
#> 6 sample_6
Genotype calls created by variant callers, e.g. samtools and GATK, are stored in the genotype slot as an N x M matrix, where N and M are the number of samples and markers, respectively.
# Genotype data
object$genotype[1:5, 1:5]
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0 1 1 1 0
#> [2,] 0 1 1 1 0
#> [3,] 0 0 2 2 0
#> [4,] 0 1 1 1 0
#> [5,] 0 1 1 1 0
Estimated haplotype data created by GBScleanR are stored in the haplotype slot as a P x N x M array, where P, N, and M are the number of ploidy, samples, and markers.
# Haplotype data
object$haplotype[, 1:5, 1:5]
#> , , 1
#>
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 1 2 2 2
#> [2,] 2 2 2 1 1
#>
#> , , 2
#>
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 1 2 2 2
#> [2,] 2 2 2 1 1
#>
#> , , 3
#>
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 1 2 2 2
#> [2,] 2 2 2 1 1
#>
#> , , 4
#>
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 1 2 2 2
#> [2,] 2 2 2 1 1
#>
#> , , 5
#>
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 1 2 2 2
#> [2,] 2 2 2 1 1
The dosage slot stores dosage data calculated from the haplotype
data. The dosage data will be provided only if the given population
shows segregation of two haplotypes, e.g. a biparental population
derived from two inbred founders. The integer of the [i,j]
element of the dosage data matrix indicates the number of the Hap2
allele at the jth marker of the ith
sample.
# Dosage data
object$dosage[1:5, 1:5]
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 1 1 1 1
#> [2,] 1 1 1 1 1
#> [3,] 2 2 2 2 2
#> [4,] 1 1 1 1 1
#> [5,] 1 1 1 1 1
The genovis class object can be constructed directly from a GDS file
created by the GBScleanR package. The GBScleanR package provides
functions to convert Variant Call Format (VCF) files to Genotype Data
Structure (GDS) files for quick access to large genotype data and
estimate haplotypes and dosages of genotype markers based on allelic
read counts using a hidden Markov model. The
gbscleanr2genovis() function extracts marker, sample, and
genotype information from a given GDS file. If the GDS file contains
haplotype and dosage information, these data are also included in the
output genovis class object.
object <- gbscleanr2genovis(gds_fn = "/path/to/your/gds_file.gds")
The genotype, haplotype, and dosage information in a genovis class object can be visualized using a variety of data visualization functions provided by the genovisr package.
Classical graphical genotype is a representation of genotypes by
colored tiles to indicate which genomic region has which genomic segment
descended from which parent. Therefore, the word “genotype” means
descended “haplotype” in this context nowadays. The
evalSegments() function summarizes breakpoints of
haplotypes and dosages in the given genomes.
object <- evalSegments(object = object)
The plotGraphGeno() function visualizes haplotypes and
dosages in a graphical representation based on the obtained breakpoint
information.
plotGraphGeno(object = object, direction = "h", data = "haplotype", sample = 1:5, width = 0.1)
plotGraphGeno(object = object, direction = "h", data = "dosage", sample = 1:10, width = 0.1)
plotGraphGeno(object = object, direction = "v", data = "haplotype", sample = 1, width = 0.15)
plotGraphGeno(object = object, direction = "v", data = "dosage", sample = 1, width = 0.1)
The plotly package can be used to make these plots interactive.
p <- plotGraphGeno(object = object, direction = "v", data = "haplotype", sample = 1, width = 0.15)
ggplotly(p)
p <- plotGraphGeno(object = object, direction = "v", data = "dosage", sample = 1, width = 0.1)
ggplotly(p)
We can also get the statistical summary of haplotype and dosage
information using the statsGeno() function. The function
requires a numeric vector indicating the lengths of chromosomes.
# If you have the reference genome FASTA, retrieve the chromosome lengths from the FASTA
# ref_genome <- readDNAStringSet(filepath = "genome.fa")
# ref_genome <- sort(ref_genome)
# chr_len <- width(ref_genome)
chr_len <- rep(42, 12) * 1e6
object <- statsGeno(object = object, chr_len = chr_len)
Visualize the obtained statistical summary using
plotStats().
plotStats(object = object, data = "haplotype", group = "hap", value = "class")
plotStats(object = object, data = "dosage", group = "sample", value = "segment")
The plotHist() function can be used to create histograms
of segment lengths for haplotype or dosage data.
plotHist(object = object, data = "haplotype", binwidth = 1e6, fill = "darkgreen")
plotHist(object = object, data = "dosage", binwidth = 1e6, fill = "blue", chrwise = TRUE)
plotHist(object = object, data = "haplotype", binwidth = 1e6, fill = "red", samplewise = TRUE, ncol = 3)
GenovisR also provides an interactive Shiny application for visualizing and analyzing genotypic data.
Use the shinyGenovir() function to launch the Shiny
application.
shinyGenovir()
This application allows users to load GDS files, set parent samples, and visualize data interactively through various plots and controls. The Shiny app integrates many of the functions described above into a user-friendly interface.
Once the Shiny application is running, you can load a GDS file by selecting it from your file system. After the file is loaded, you can set the parent samples and visualize the genotype, haplotype, and dosage information interactively.
# Example of loading a GDS file within the Shiny app
The Shiny app provides controls for tweaking histogram parameters, graphical genotype parameters, and statistical parameters, allowing for dynamic and interactive data exploration.
# Example of visualizing data within the Shiny app
By following this vignette, you should be able to effectively use the GenovisR package to visualize and analyze genotypic data, whether through R scripts or an interactive Shiny application.